Overview

The National Health and Nutrition Examination Survey (NHANES) was designed to assess the health and nutritional status of the US population and is conducted by the National Center for Health Statistics of the Centers for Disease Control and Prevention. Since 1999-2000, NHANES has been conducted in two-year cycles. In each cycle, potential participants are identified through stratified, multistage probability sampling of the civilian, non-institutionalized US population. In this set of exercises, we will use the ten cycles conducted from 1999-2000 through 2017-2018.

Import

Data dictionary

Review this briefly and use it as a reference while working through the exercises below.

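# knitr and kableExtra build and style the HTML table; tibble provides
# enframe(), and dplyr supplies the %>% pipe used below.
library(knitr)
library(kableExtra)
library(tibble)
library(dplyr)

# Named character vector: names are column names, values are descriptions
nhanes_descr <- c(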
  "seqn"              = "SP identifier",
  "exam"              = "NHANES exam year",
  "psu"               = "primary sampling unit",
  "strata"            = "survey strata",
  "wts_mec_2yr"       = "survey weights",
  "exam_status"       = "How did SP engage with exam?",
  "age"               = "SP age, years",
  "age_group"         = "SP age group, years",
  "sex"               = "SP sex",
  "race_ethnicity"    = "SP race and/or ethnicity",
  "education"         = "SP education",
  "income_hh"         = "SP household income",
  "pregnant"          = "was SP pregnant at time of exam?",
  "bp_sys_mmhg"       = "SP systolic blood pressure, mm Hg",
  "bp_dia_mmhg"       = "SP diastolic blood pressure, mm Hg",
  "n_msr_sbp"         = "Number of valid systolic BP readings",
  "n_msr_dbp"         = "Number of valid diastolic BP readings",
  "bp_controlled"     = "Did SP have controlled BP? (<140/90 mm Hg)",
  "acr_mgg"           = "SP albumin-to-creatinine ratio, mg/g",
  "albuminuria"       = "Did SP have albuminuria? (ACR > 30 mg/g)",
  "chol_hdl_mgdl"     = "SP HDL-cholesterol, mg/dl",
  "chol_total_mgdl"   = "SP total cholesterol, mg/dl",
  "health_insurance"  = "SP health insurance status",
  "bp_high_aware"     = "SP ever told by Dr: 'you have high blood pressure'?",
  "bp_meds"           = "SP currently using antihypertensive medication?",
  "hc_usual_facility" = "SP has a usual healthcare facility?",
  "hc_visit_1yr"      = "SP visited their healthcare facility last year?"
)

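# Abbreviations used in the descriptions, shown as a table footnote
abbrevs <- c("SP = survey participant",
  "BP = blood pressure",
  "HDL = high density lipoprotein")

# enframe() turns the named vector into a two-column tibble (name, value),
# which kable() renders and kableExtra styles and annotates
enframe(nhanes_descr) %>%
  kable(col.names = c('Variable', 'Description')) %>%
  kable_styling(bootstrap_options = c('striped', 'hover')) %>%
  footnote(general = paste(abbrevs, collapse = '; '))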
| Variable          | Description                                         |
|-------------------|-----------------------------------------------------|
| seqn              | SP identifier                                       |
| exam              | NHANES exam year                                    |
| psu               | primary sampling unit                               |
| strata            | survey strata                                       |
| wts_mec_2yr       | survey weights                                      |
| exam_status       | How did SP engage with exam?                        |
| age               | SP age, years                                       |
| age_group         | SP age group, years                                 |
| sex               | SP sex                                              |
| race_ethnicity    | SP race and/or ethnicity                            |
| education         | SP education                                        |
| income_hh         | SP household income                                 |
| pregnant          | was SP pregnant at time of exam?                    |
| bp_sys_mmhg       | SP systolic blood pressure, mm Hg                   |
| bp_dia_mmhg       | SP diastolic blood pressure, mm Hg                  |
| n_msr_sbp         | Number of valid systolic BP readings                |
| n_msr_dbp         | Number of valid diastolic BP readings               |
| bp_controlled     | Did SP have controlled BP? (<140/90 mm Hg)          |
| acr_mgg           | SP albumin-to-creatinine ratio, mg/g                |
| albuminuria       | Did SP have albuminuria? (ACR > 30 mg/g)            |
| chol_hdl_mgdl     | SP HDL-cholesterol, mg/dl                           |
| chol_total_mgdl   | SP total cholesterol, mg/dl                         |
| health_insurance  | SP health insurance status                          |
| bp_high_aware     | SP ever told by Dr: ‘you have high blood pressure’? |
| bp_meds           | SP currently using antihypertensive medication?     |
| hc_usual_facility | SP has a usual healthcare facility?                 |
| hc_visit_1yr      | SP visited their healthcare facility last year?     |

Note: SP = survey participant; BP = blood pressure; HDL = high density lipoprotein

Prerequisites

Before starting these exercises, you should be familiar with the material in:

  1. The "Isolating data with dplyr" primer.

  2. Chapter 5 of R for Data Science.

Exercise 1

Suppose we are conducting a study that includes only NHANES participants who meet the following conditions:

  1. Age 18+
  2. Completed the NHANES interview and examination
  3. Not pregnant at time of exam
  4. 1+ systolic and diastolic BP measurement
  5. Complete information on BP medication use.

Create a tibble with two columns:

  • inclusion: a numeric vector taking values 1, 2, 3, 4, and 5

  • description: a character vector containing the five inclusion criteria listed above.
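One way to build this tibble is sketched below, assuming the tibble package is installed. The description strings are copied from the list above, including the trailing period on the fifth criterion so they match the table in Exercise 3.

```r
library(tibble)

# One row per inclusion criterion, in the order they will be applied
inclusions <- tibble(
  inclusion   = c(1, 2, 3, 4, 5),
  description = c(
    "Age 18+",
    "Completed the NHANES interview and examination",
    "Not pregnant at time of exam",
    "1+ systolic and diastolic BP measurement",
    "Complete information on BP medication use."
  )
)

inclusions
```

Using c(1, 2, 3, 4, 5) rather than 1:5 makes inclusion a double vector; both are numeric in R, so either satisfies the exercise.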

Your solution should look like this:

Exercise 2

Use the filter() function to apply each inclusion criterion to the NHANES data, in the same order as listed above. Because we need to track how many participants remain after each step, I recommend creating five separate datasets, like so:

Create a numeric vector of length 5 containing the number of rows in each dataset you created (i.e., nrow(e1), …, nrow(e5)). Add that vector to the tibble you created in Exercise 1 as a column named n.
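The sequential filtering and counting can be sketched on a small made-up dataset. The column names come from the data dictionary, but the six rows and the value labels ("interview and exam", "yes", "no") are hypothetical; check the actual levels in the NHANES data before adapting the conditions. Reading "complete information on BP medication use" as a non-missing bp_meds value is also an assumption.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for the NHANES tibble
toy <- tibble(
  age         = c(45, 17, 60, 33, 52, 70),
  exam_status = c("interview and exam", "interview and exam", "interview only",
                  "interview and exam", "interview and exam", "interview and exam"),
  pregnant    = c("no", "no", "no", "yes", "no", "no"),
  n_msr_sbp   = c(3, 2, 0, 3, 3, 0),
  n_msr_dbp   = c(3, 2, 0, 3, 2, 0),
  bp_meds     = c("no", "yes", NA, "no", NA, "yes")
)

# Each step starts from the previous dataset, so criteria accumulate.
# Note that filter() drops rows where a condition evaluates to NA.
e1 <- filter(toy, age >= 18)
e2 <- filter(e1, exam_status == "interview and exam")  # assumed label
e3 <- filter(e2, pregnant == "no")                     # assumed label
e4 <- filter(e3, n_msr_sbp > 0, n_msr_dbp > 0)         # comma means AND
e5 <- filter(e4, !is.na(bp_meds))

# Rows remaining after each step
n_counts <- c(nrow(e1), nrow(e2), nrow(e3), nrow(e4), nrow(e5))

# Attach the counts to the Exercise 1 tibble
# (description column omitted here for brevity)
inclusions <- tibble(inclusion = c(1, 2, 3, 4, 5)) %>%
  mutate(n = n_counts)

inclusions
```

Keeping e1 through e5 as separate objects, rather than one long pipe, is what makes the per-step counts available.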

Your solution should look like this:

Exercise 3

Use kable() and kable_styling() to create a publication-ready table describing your inclusion criteria.
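A sketch of the table-building step follows, using the participant counts from the expected table below. It sets format = "html" explicitly, an assumption made so the code also works outside an R Markdown document, where kable() would otherwise produce a plain pipe table that kable_styling() cannot style.

```r
library(tibble)
library(knitr)
library(kableExtra)
library(dplyr)  # for the %>% pipe

# Criteria and counts as shown in the expected table
inclusions <- tibble(
  description = c(
    "Age 18+",
    "Completed the NHANES interview and examination",
    "Not pregnant at time of exam",
    "1+ systolic and diastolic BP measurement",
    "Complete information on BP medication use."
  ),
  n = c(59204, 56367, 54779, 52007, 51761)
)

inclusion_table <- inclusions %>%
  kable(
    format    = "html",
    col.names = c("Inclusion criteria", "No. of participants")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  footnote(general = "BP = blood pressure; No. = number")

inclusion_table
```

In your own solution, the n column would come from the tibble built in Exercise 2 rather than being typed in by hand.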

Your solution should look like this:

| Inclusion criteria                             | No. of participants |
|------------------------------------------------|--------------------:|
| Age 18+                                        |               59204 |
| Completed the NHANES interview and examination |               56367 |
| Not pregnant at time of exam                   |               54779 |
| 1+ systolic and diastolic BP measurement       |               52007 |
| Complete information on BP medication use.     |               51761 |

Note: BP = blood pressure; No. = number

Exercise 4

Using your final sample of 51,761 NHANES participants, determine whether the highest systolic BP value in each exam year was recorded for a male or a female participant. To do this, apply a grouped filter (i.e., use group_by() and then filter()).

  • The groups should be defined by exam.

  • The filter step should keep rows whose systolic BP equals the maximum systolic BP value within the group.

  • A final step should use select() to keep only the exam, bp_sys_mmhg, and sex columns.
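The grouped filter can be sketched on a small hypothetical dataset; the exam labels, sexes, and BP values below are invented for illustration. Because max() is evaluated within each group, every exam year keeps its own highest value, and a tie keeps all tied rows.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; values are made up for illustration
toy <- tibble(
  exam        = c("1999-2000", "1999-2000", "1999-2000",
                  "2001-2002", "2001-2002", "2001-2002"),
  sex         = c("male", "female", "male", "female", "male", "female"),
  age         = c(61, 74, 38, 69, 55, 42),
  bp_sys_mmhg = c(150, 182, 120, 176, 176, 110)
)

highest_sbp <- toy %>%
  group_by(exam) %>%
  # max() is computed separately within each exam year
  filter(bp_sys_mmhg == max(bp_sys_mmhg, na.rm = TRUE)) %>%
  select(exam, bp_sys_mmhg, sex)

highest_sbp
```

In this toy data the 2001-2002 group returns two rows, because a female and a male participant share the maximum value.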

Your solution should look like this:

Do you notice anything peculiar? Try looking at the minimum systolic BP values instead. Do you notice any peculiar patterns here?
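When switching to the minimum, it can help to count how many rows each exam year returns, since ties show up as more than one row per group. The sketch below again uses invented toy values; it swaps max() for min() and adds an ungroup() and count() step.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; values are made up for illustration
toy <- tibble(
  exam        = c("1999-2000", "1999-2000", "1999-2000",
                  "2001-2002", "2001-2002", "2001-2002"),
  sex         = c("male", "female", "female", "male", "female", "male"),
  bp_sys_mmhg = c(110, 95, 95, 120, 100, 130)
)

lowest_sbp <- toy %>%
  group_by(exam) %>%
  filter(bp_sys_mmhg == min(bp_sys_mmhg, na.rm = TRUE)) %>%
  select(exam, bp_sys_mmhg, sex)

# Number of rows tied at the minimum in each exam year
tie_counts <- lowest_sbp %>%
  ungroup() %>%
  count(exam)

tie_counts
```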


  1. University of Alabama at Birmingham,